Granulating Data On Non-Scalar Attribute Values

Authors

  • Lawrence Mazlack
  • Sarah Coppock
Abstract

Data mining discovers interesting information from a data set. Mining incorporates different methods and considers different kinds of information. Granulation is an important aspect of mining. The data sets can be extremely large, with multiple kinds of data in high dimensionality. Without granulation, large data sets are often computationally infeasible to mine, and the generated results may be overly fine grained. Most available algorithms work with quantitative data. However, many data sets contain a mixture of quantitative and qualitative data. Our goal is to group records containing multiple data varieties: quantitative (discrete, continuous) and qualitative (ordinal, nominal). Grouping based on different quantitative metrics can be difficult. Incorporating various qualitative elements is not simple. There are partially successful strategies as well as several differential geometries. We expect to use a mixture of scalar methods and soft computing methods (rough sets, fuzzy sets), as well as methods using other metrics. To cluster whole records in a data set, it would be useful to have a general similarity metric or a set of integrated similarity metrics that would allow record-to-record similarity comparisons. There are methods to granulate data items belonging to a single attribute. Few methods exist that might meaningfully handle a combination of many data varieties in a single metric. This paper is an initial consideration of strategies for integrating multiple metrics in the task of granulating records.

GROUPING RECORDS TOGETHER

Granulation helps data mining accomplish association rule discovery, classification, partitioning, clustering, and sequence discovery. Without granulation, large data sets are often computationally infeasible to mine, and the generated results may be overly fine grained.
Mining a data set composed of varied data stored in records can focus on granulating individual attributes, data extracted from the records making up the data set, or whole records. To cluster whole records in a data set, it would be useful to have a general similarity metric that would allow record-to-record similarity comparison. There are methods to granulate data items belonging to a single attribute. Unfortunately, few methods exist that meaningfully account for a combination of many data varieties.

Clustering groups objects into clusters so that the similarity among objects within the same cluster (intra-cluster similarity) is maximized and the similarity between objects in different clusters (inter-cluster similarity) is minimized. Clustering increases granule size and is useful in data mining. Clustering can discover the general data distribution and aid in the discovery of similar objects described in the data set. A good characterization of the resulting clusters can also be a valuable data mining product.

There are two types of hierarchical approaches to clustering: agglomerative and divisive. Agglomerative clustering begins with each object in its own cluster and repeatedly merges the pair of clusters whose similarity is largest, until all objects are in the same cluster. Conversely, divisive clustering begins with all objects in the same cluster and does the reverse. Because most well-understood approaches use a similarity metric, an appropriate similarity metric or a way to integrate diverse metrics is desirable. It must handle any mix of data varieties without losing the meaning behind the metrics.

Another approach to grouping records is partitioning. Sometimes the term partitioning is used as if synonymous with clustering. However, partitioning can also be approached as a purification process (Coppersmith, 1999) in which partitions progressively become more pure.
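As an illustration only, the agglomerative procedure and the idea of integrating per-attribute metrics into one record-to-record similarity can be sketched as follows. This is not the authors' method: the schema, the attribute names (`age`, `size`, `color`), the value ranges, the unweighted averaging of per-attribute similarities (a Gower-style choice) and the average-linkage rule are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical schema: attribute name -> (kind, extra info).
# "num" carries the value range, "ord" the ordered levels, "nom" nothing.
SCHEMA = {
    "age": ("num", (0.0, 100.0)),
    "size": ("ord", ["small", "medium", "large"]),
    "color": ("nom", None),
}

def attribute_similarity(kind, info, a, b):
    """Similarity in [0, 1] for a single attribute of the given kind."""
    if kind == "num":
        lo, hi = info
        return 1.0 - abs(a - b) / (hi - lo)
    if kind == "ord":
        # Compare rank positions, scaled by the distance between the extremes.
        return 1.0 - abs(info.index(a) - info.index(b)) / (len(info) - 1)
    return 1.0 if a == b else 0.0  # nominal: exact match only

def record_similarity(r, s, schema=SCHEMA):
    """Unweighted mean of the per-attribute similarities (an assumption)."""
    sims = [attribute_similarity(kind, info, r[name], s[name])
            for name, (kind, info) in schema.items()]
    return sum(sims) / len(sims)

def cluster_similarity(c1, c2, records):
    """Average-linkage similarity between two clusters of record indices."""
    pairs = [(i, j) for i in c1 for j in c2]
    return sum(record_similarity(records[i], records[j]) for i, j in pairs) / len(pairs)

def agglomerate(records, target_clusters=1):
    """Merge the most similar pair of clusters until target_clusters remain."""
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > target_clusters:
        a, b = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_similarity(clusters[p[0]], clusters[p[1]], records),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters

# Made-up records mixing continuous, ordinal, and nominal values.
records = [
    {"age": 20.0, "size": "small", "color": "red"},
    {"age": 25.0, "size": "small", "color": "red"},
    {"age": 70.0, "size": "large", "color": "blue"},
    {"age": 65.0, "size": "large", "color": "blue"},
]
clusters = agglomerate(records, target_clusters=2)
print(clusters)
```

This naive version recomputes every pairwise cluster similarity on each pass, so it is practical only for small inputs. Averaging the attribute similarities with equal weights is also a simplification: it treats one step on an ordinal scale as comparable to a fraction of a numeric range, which is exactly the kind of loss of meaning that an integrated metric has to guard against.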
Increasing the granule size of small partitions is then a matter of relaxing partition boundaries through either rough sets or fuzzy values.

A data set can have millions of records with hundreds of attributes. The attributes may have many disparate kinds of data. Some algorithms offer promise in handling multiple kinds of data. Unfortunately, they are not scalable because their complexity is geometric; they are useful only for small data sets. In addition, some approaches lose the meaning of the metric when trying to minimize algorithmic complexity.
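To make the rough-set relaxation of partition boundaries mentioned above more concrete, the following minimal sketch computes the standard lower and upper approximations of a target set of records. It uses the textbook rough-set definitions rather than any procedure given in this paper, and the records, attribute names, and `label` field are hypothetical.

```python
from collections import defaultdict

def equivalence_classes(records, attributes):
    """Group record indices that are indiscernible on the chosen attributes."""
    classes = defaultdict(set)
    for idx, rec in enumerate(records):
        classes[tuple(rec[a] for a in attributes)].add(idx)
    return list(classes.values())

def approximations(classes, target):
    """Return the (lower, upper) rough approximations of the target index set."""
    lower, upper = set(), set()
    for cls in classes:
        if cls <= target:   # class lies entirely inside the target
            lower |= cls
        if cls & target:    # class overlaps the target at all
            upper |= cls
    return lower, upper

# Made-up records; "label" marks the partition we want to approximate.
records = [
    {"size": "small", "color": "red", "label": "A"},
    {"size": "small", "color": "red", "label": "A"},
    {"size": "large", "color": "red", "label": "A"},
    {"size": "large", "color": "red", "label": "B"},
    {"size": "large", "color": "blue", "label": "B"},
]
target = {i for i, r in enumerate(records) if r["label"] == "A"}
classes = equivalence_classes(records, ["size", "color"])
lower, upper = approximations(classes, target)
boundary = upper - lower  # records that cannot be assigned with certainty

print(sorted(lower), sorted(upper), sorted(boundary))
```

In this toy data the lower approximation is the pure part of partition A, while the upper approximation also admits the records that are indiscernible from a member of A. Moving from the lower to the upper approximation is one way to read "relaxing the boundary": the granule grows, at the cost of including records whose membership is uncertain.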



Publication date: 2002